Capturing Chaos: Rendering Handwritten Language Documents

Author

  • John Henderson
Abstract

This paper demonstrates how the nature of a source language document, and the broad goals set for the usability of the content, can direct the process of creating digital language documentation from that source. Gerhardt Laves’s handwritten 1931 field notes on Noongar language and culture of southwestern Australia were retranscribed using an XML markup scheme and processed in various ways using XSLT. The central goals were to produce usable resources for community language activities and for linguistic and other scholarly analysis. A specific requirement for a rough facsimile representation, in recognizing that some of the graphic form of the notes was content that should be represented in the markup, contributed significantly to the specification of the markup scheme. Consultation with the Noongar community led to the recognition of Noongar families’ rights in the materials and the recognition of culturally sensitive content, which together led to a requirement for multiple versions with varying content. The general nature of these handwritten notes also raises important issues of reliability and attribution that must be handled in the markup scheme.
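The requirement for multiple versions with varying content can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the element names (`page`, `line`, `gloss`, `unclear`), the `access="restricted"` attribute, and the function `public_version` are hypothetical and are not taken from the markup scheme described in the paper. The paper's processing used XSLT; Python's standard library is swapped in here only to show the idea of deriving a reduced version from one marked-up source.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: element and attribute names are illustrative only,
# not the scheme actually used for the Laves notes.
SAMPLE = """<page n="1">
  <line>word one <gloss>meaning</gloss></line>
  <line access="restricted">culturally sensitive passage</line>
  <line>word two <unclear>uncertain reading</unclear></line>
</page>"""


def public_version(xml_text: str) -> str:
    """Return the document as a string with restricted elements removed."""
    root = ET.fromstring(xml_text)
    # Snapshot the element list so the tree is not mutated while iterating.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.get("access") == "restricted":
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(public_version(SAMPLE))
```

Keeping the sensitivity flag in the markup, rather than maintaining separate files, means each version is generated from the same transcription, so a correction to the source carries through to every version.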

Similar resources

Connected Component Based Word Spotting on Persian Handwritten image documents

Word spotting makes unindexed image documents searchable by locating a word or words in a document image, given a query word. This problem is challenging, mainly due to the large number of word classes with very small inter-class and substantial intra-class distances. In this paper, a segmentation-based word spotting method is presented for multi-writer Persian handwritten doc-...

Indexing and Retrieval of On-line Handwritten Documents

Recent advances in on-line data capturing technologies and their widespread deployment in devices like PDAs and notebook PCs are creating large amounts of handwritten data that need to be archived and retrieved efficiently. Word-spotting, which is based on a direct comparison of a handwritten keyword to words in the document, is commonly used for indexing and retrieval. We propose a string matchin...

Natural Language Inspired Approach for Handwritten Text Line Detection in Legacy Documents

Document layout analysis is an important task needed for handwritten text recognition among other applications. Text layout commonly found in handwritten legacy documents is in the form of one or more paragraphs composed of parallel text lines. An approach for handwritten text line detection is presented which uses machine learning techniques and methods widely used in natural language processin...

Radial Line Fourier Descriptor for Segmentation-free Handwritten Word Spotting

Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval or word spotting is popularly used for information retrieval and digitization of the historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exi...

Documents Are Not Just for Printing

Modern use has emphasized the written/printed nature of documents to the extent that today documents imply writing on paper. This trend is well reflected by document processing systems that can produce high quality paper output. The world of electronic documents currently abounds with a plethora of markup languages, document encodings and typesetting systems designed to facilitate the production...

Publication date: 2008